An introduction to open-source R software for wearable device data processing and analysis

Brian C. Helsel, PhD
Assistant Professor

KU Alzheimer’s Disease Research Center
Department of Neurology | The University of Kansas Medical Center

Introduction

  • R is a functional programming language for statistical computing and graphics.1
  • RStudio is an integrated development environment for R
  • There are >20,000 available packages on CRAN extending the utility of R
  • This presentation was created in Quarto using R

Why R?

  • Many open-source physical activity packages available on CRAN or GitHub
  • Open-source code free to view on GitHub
  • Reticulate is a Python-R interface that makes Python packages available to R users
  • Customized scripts or packages can meet the needs of a research study.

Objectives

The purpose of today’s presentation is to:

  1. Introduce you to R packages used in physical activity research
  2. Provide examples with code for how physical activity data can be processed, analyzed, and visualized in R.

Getting Started

# Install the devtools package from CRAN
install.packages("devtools")

Download the Presentation

Install this presentation into RStudio as an R package

# Install the presentationsR package
options(timeout = 999999) # Increase to avoid timeout issues
devtools::install_github("bhelsel/presentationsR")
presentationsR::render_presentation(outdir = "/full/path/to/folder", name = "csacsm24")

Visit GitHub to download as a compressed .zip file

https://github.com/bhelsel/presentationsR

Wearable Devices

  • Accelerometry was first introduced in the 1980’s2
  • Over the past ~40 years:
    • The physical size of portable accelerometers is smaller
    • More onboard data storage capacity
    • Refined data collection and processing protocols
    • Open-source software is becoming increasingly prevalent

Open-source Software

  • Has an important role in advancing scientific reproducibility and rigor
  • Can be developed in a collaborative public fashion
  • Software needs to be findable, accessible, interoperable, and reusable
  • Many scientists who develop or use open-source software are self-taught
  • Comprehensive, didactic documentation and training opportunities are important

A Basic Accelerometry Workflow

  • Read in the acceleration data from ActiGraph .gt3x and .agd files
  • Convert the raw acceleration data to ActiGraph counts
  • Identify non-wear time using the Choi algorithm
  • Apply cut-points and separate acceleration data into intensity bands3
  • Create compositional variables for analysis and visualizations

Read in the raw acceleration data

# Binary .gt3x files can be read with the read.gt3x pacakge
gt3xFile <- system.file("extdata/example3min.gt3x", package = "presentationsR")
gt3xData <- read.gt3x::read.gt3x(gt3xFile, asDataFrame = TRUE)
head(gt3xData)
Sampling Rate: 100Hz
Firmware Version: 1.7.2
Serial Number Prefix: TAS
                 time      X     Y     Z
1 2023-06-13 08:34:00 -0.020 0.008 1.031
2 2023-06-13 08:34:00  0.000 0.004 1.035
3 2023-06-13 08:34:00  0.008 0.004 1.031
4 2023-06-13 08:34:00  0.008 0.004 1.035
5 2023-06-13 08:34:00  0.008 0.004 1.031
6 2023-06-13 08:34:00  0.016 0.008 1.027
sprintf("Data has %s rows and %s columns", nrow(gt3xData), ncol(gt3xData))
[1] "Data has 18000 rows and 4 columns"

Example Output

Using Reticulate

ActiGraph has the pygt3x file reader in a Python package https://github.com/actigraph/pygt3x which has subtle differences from the read.gt3x R package.

if(reticulate::py_module_available("pygt3x")){
  `%as%` <- reticulate::`%as%`
  Reader <- reticulate::import("pygt3x.reader", convert = FALSE)
  pd <- reticulate::import("pandas")
  reset_index <- pd$core$frame$DataFrame$reset_index
  FileReader <- Reader$FileReader
  to_pandas <- FileReader$to_pandas
  with(FileReader(gt3xFile) %as% reader, {
    gt3xDataPy = reset_index(to_pandas(reader))
  })
  colnames(gt3xDataPy) <- c("time", "X", "Y", "Z")
  gt3xDataPy$time <- as.POSIXct(
    gt3xDataPy$time, origin = "1970-01-01 00:00:00", tz = "UTC"
    )
}

Converting to ActiGraph Counts

  • ActiGraph’s count algorithm was made publicly available in 2022.4
  • The agcounts Python module was converted to an R package in 2022 and is available on GitHub https://github.com/bhelsel/agcounts.
# ::: allows you to call hidden and non-exported functions
sf <- agcounts:::.get_frequency(gt3xData)
resampled_data <- agcounts:::.resample(gt3xData, frequency = sf)
filtered_data <- agcounts:::.bpf_filter(resampled_data)
trimmed_data <- agcounts:::.trim_data(filtered_data, lfe_select = FALSE)
resampled_10Hz_data <- agcounts:::.resample_10hz(trimmed_data)
epoch_counts <- agcounts:::.sum_counts(resampled_data, epoch = 60)

The Power of Functions

.bpf_filter <- function(downsample_data, verbose = FALSE){
  if(verbose){
    print("Filtering Data")
  }
  a = as.numeric(.coefficients$output_coefficients)
  b = as.numeric(.coefficients$input_coefficients)
  zi <- gsignal::filter_zi(filt = b, a = a)
  zi <- matrix(rep(zi, 3), nrow = 3, byrow = 3) * downsample_data[, 1]
  rownames(zi) <- c("X", "Y", "Z")
  bpf_data_x <- gsignal::filter(filt = b, a = a, x = downsample_data["X", ], zi = zi["X", ])
  bpf_data_y <- gsignal::filter(filt = b, a = a, x = downsample_data["Y", ], zi = zi["Y", ])
  bpf_data_z <- gsignal::filter(filt = b, a = a, x = downsample_data["Z", ], zi = zi["Z", ])
  bpf_data <- matrix(rbind(bpf_data_x$y, bpf_data_y$y, bpf_data_z$y), nrow = 3)
  bpf_data = ((3.0 / 4096.0) / (2.6 / 256.0) * 237.5) * bpf_data # 17.127404 is used in ActiLife and 17.128125 is used in firmware.
  rownames(bpf_data) <- c("X", "Y", "Z")
  return(bpf_data)
}

A Wrapper Function

A function with a main purpose of calling secondary functions or code to ease implementation.

agcounts::get_counts(gt3xFile, epoch = 30, lfe_select = TRUE, parser = "read.gt3x")
                 time Axis1 Axis2 Axis3 Vector.Magnitude
1 2023-06-13 08:34:00  1723  1384  2206             3123
2 2023-06-13 08:34:30  1032  1842  1455             2564
3 2023-06-13 08:35:00  1055  2074  1420             2726
4 2023-06-13 08:35:30   871  1974  1520             2639
5 2023-06-13 08:36:00   924  1717  1278             2331
6 2023-06-13 08:36:30  1377  1764  1480             2683
agcounts::get_counts(gt3xFile, epoch = 60, lfe_select = FALSE, parser = "read.gt3x")
                 time Axis1 Axis2 Axis3 Vector.Magnitude
1 2023-06-13 08:34:00  2606  3116  3542             5389
2 2023-06-13 08:35:00  1738  3943  2840             5161
3 2023-06-13 08:36:00  2172  3363  2646             4799

SQL Database

  • ActiGraph’s .agd files are SQL databases
  • Data from a SQL database is loaded with the DBI package
agdFile <- system.file("extdata/example20hours1sec.agd", package = "presentationsR")
# Open the SQL database connection
con <- DBI::dbConnect(RSQLite::SQLite(), agdFile)
# Read the data into R from the SQL database
agdData <- DBI::dbReadTable(con, "data")
# Close the SQL database
DBI::dbDisconnect(con)
                 time axis1 axis2 axis3   vm
1 2023-04-24 15:02:00  1235  1646  1550 3079
2 2023-04-24 15:03:00   808   931  1102 1846
3 2023-04-24 15:04:00   361   410  1437 1628
4 2023-04-24 15:05:00  1060  1004  1927 2639
5 2023-04-24 15:06:00   697  1079   755 1700
6 2023-04-24 15:07:00   126   249   482  585

Wear Time Marking

Choi et al. (2011)5

agdData <- 
  agdData |>
  PhysicalActivity::wearingMarking(
    frame = 90,
    perMinuteCts = 1, 
    TS = "time",
    cts = "vm",
    allowanceFrame = 2,
    newcolname = "wear"
  )
head(agdData)
                 time axis1 axis2 axis3   vm wear weekday days
1 2023-04-24 15:02:00  1235  1646  1550 3079    w  Monday    1
2 2023-04-24 15:03:00   808   931  1102 1846    w  Monday    1
3 2023-04-24 15:04:00   361   410  1437 1628    w  Monday    1
4 2023-04-24 15:05:00  1060  1004  1927 2639    w  Monday    1
5 2023-04-24 15:06:00   697  1079   755 1700    w  Monday    1
6 2023-04-24 15:07:00   126   249   482  585    w  Monday    1

Cut-points and Intensity Bands

Montoye et al (2020)6

classified_data <- 
  agdData |>
  dplyr::mutate(intensity = cut(
    vm, breaks = c(0, 2860, 3941, Inf), # Montoye Wrist
    labels = c("Sedentary", "Light", "MVPA"),
    include.lowest = TRUE, right = FALSE)) |>
  dplyr::mutate(intensity_bands = cut(
    vm, breaks = seq(0, 10500, 1500),
    labels = c("0_1500", "1501_3000", "3001_4500", "4501_6000", 
               "6001_7500", "7501_9000", "> 9000"),
    include.lowest = TRUE, right = FALSE))

head(classified_data[, -c(2:4)])
                 time   vm wear weekday days intensity intensity_bands
1 2023-04-24 15:02:00 3079    w  Monday    1     Light       3001_4500
2 2023-04-24 15:03:00 1846    w  Monday    1 Sedentary       1501_3000
3 2023-04-24 15:04:00 1628    w  Monday    1 Sedentary       1501_3000
4 2023-04-24 15:05:00 2639    w  Monday    1 Sedentary       1501_3000
5 2023-04-24 15:06:00 1700    w  Monday    1 Sedentary       1501_3000
6 2023-04-24 15:07:00  585    w  Monday    1 Sedentary          0_1500

Aggregating Intensity Data

intensity_binary <- cbind(
  classified_data, 
  sjmisc::to_dummy(classified_data$intensity)
  )

agdDataHour <- 
  intensity_binary |>
  dplyr::group_by(time = format(time, "%Y-%m-%d %H")) |>
  dplyr::summarise_at(dplyr::vars(intensity_1:intensity_3), sum) |>
  `colnames<-`(c("time", "Sedentary", "Light", "MVPA"))

head(agdDataHour)
# A tibble: 6 × 4
  time          Sedentary Light  MVPA
  <chr>             <dbl> <dbl> <dbl>
1 2023-04-24 15        48     5     5
2 2023-04-24 16        51     5     4
3 2023-04-24 17        27     7    26
4 2023-04-24 18        34    17     9
5 2023-04-24 19        49     5     6
6 2023-04-24 20        35     9    16

Compositional Data

# Change to composition data
comp <- compositions::acomp(agdDataHour[, 2:4])
head(round(comp, 3))
     Sedentary Light  MVPA
[1,]     0.828 0.086 0.086
[2,]     0.850 0.083 0.067
[3,]     0.450 0.117 0.433
[4,]     0.567 0.283 0.150
[5,]     0.817 0.083 0.100
[6,]     0.583 0.150 0.267
# Calculate geometric means using 60 minutes as the total
unname(compositions::clo(mean(comp, robust = FALSE), total = 60))
[1] 50.667662  4.170343  5.161995

Analysis: Isometric Log Ratio (ILR) Transformations

Van den Boogaart et al. (2008) Compositions: A unified R package to analyze compositional data7

# Sequential Binary Partition
sbp <- matrix(
  c(-1, -1,
    -1, +1,
    +1,  0),
  byrow = TRUE,
  ncol = 2)
# Base of the centered log ratio (CLR) plane
psi <- compositions::gsi.buildilrBase(sbp)
# Isometric log-ratio (ILR) transformed compositions
ilr_transformed <- compositions::ilr(comp, V = psi)
head(ilr_transformed, 5)
           [,1]       [,2]
[1,] -0.9233609 -1.5993080
[2,] -1.1303068 -1.6421761
[3,]  0.5202904 -0.9545423
[4,] -0.8022588 -0.4901291
[5,] -0.7829138 -1.6138881

Behind the Scenes: The ILR Function

isPos = (sbp > 0)
isNeg = (sbp < 0)
m <- matrix(1, nrow(sbp), nrow(sbp))
nPos = m %*% isPos
nNeg = m %*% isNeg
sbp_t = (isPos * nNeg - isNeg * nPos)
nn = sapply(1:ncol(sbp_t), function(i) {
  1/sqrt(sbp_t[, i] %*% sbp_t[, i])
})
nn = matrix(nn, ncol = ncol(sbp_t), nrow = nrow(sbp_t), byrow = TRUE)
psi = sbp_t * nn
LOG <- unclass(log(ifelse(comp > 0, comp, 1)))
clr <- ifelse(comp > 0, LOG - rowSums(LOG)/rowSums(comp > 0), 0)
ilr <- clr %*% psi
all(ilr == ilr_transformed)
[1] TRUE

Analysis: An Alternative ILR Transformation

\[ z1 = \sqrt{\frac{2}{3}} \cdot \ln\left(\frac{MVPA}{\sqrt[2]{SB \cdot LPA}}\right) \]

\[ z2 = \sqrt{\frac{1}{2}} \cdot \ln\left(\frac{LPA}{\sqrt[1]{SB}}\right) \]

Templ et al. (2011) robCompositions: An R-package for Robust Statistical Analysis of Compositional Data8

X <- comp[, c("MVPA", "Light", "Sedentary")]
utils::head(robCompositions::pivotCoord(X))
  MVPA_Li-Se   Light_Se
1 -0.9969690 -1.5993080
2 -1.1866392 -1.6421761
3  0.0565334 -0.9545423
4 -0.9349550 -0.4901291
5 -0.8689403 -1.6138881
6 -0.3379092 -0.9603383
all.equal(MoveKC::ilr_pivotCoord(X), robCompositions::pivotCoord(X))
[1] TRUE

Next Steps

  • Visualize the data using ternary plots
  • Use different multivariate tools
  • Raw acceleration data
  • Explore isotemporal substitution models to determine the effects of time-use reallocation

Fitbit API

  • Fitbit provides access to user data via their application programming interface (API)
  • Third-party companies exist to make access to the Fitbit API easier for investigators
    • Pilot studies lack the budget to pay for the service
    • Participant data is linked to another platform
  • Tools in R make it possible to retrieve Fitbit data and it can be automated!

iFibit

  • We created iFitbit to simplify working with the Fitbit API
  • It allows you to extract activity, exercise, sleep, and heart rate data to:
    • Store in a SQL database
    • Run interactive reports

To install:

devtools::install_github("bhelsel/iFitbit", build_vignettes = TRUE)
vignette("fitbit-application", package = "iFitbit")

Example Data from the Fitbit API

https://dev.fitbit.com/build/reference/web-api/

fb_file <- system.file("extdata/exampleFitbit.db", package = "presentationsR")
con <- DBI::dbConnect(RSQLite::SQLite(), fb_file)
tbl_n <- DBI::dbListTables(con) # List table names
data <- lapply(tbl_n, function(x) DBI::dbReadTable(con, name = x))
names(data) <- tbl_n
tbl_n
[1] "activities"   "device"       "exercise_log" "heart"        "sleep"       

Activities

date distance elevation floors minutesSedentary minutesLightlyActive minutesFairlyActive minutesVeryActive steps
2022-02-28 1.840 6 6 1169 206 45 20 9412
2022-03-01 2.397 22 4 1319 86 26 9 8150
2022-03-02 2.028 12 6 1240 161 34 5 8496
2022-03-03 2.565 10 1 1249 124 52 15 6076
2022-03-04 1.197 28 4 1192 206 25 17 10728
2022-03-05 0.820 10 3 1190 212 21 17 9668
2022-03-06 2.392 5 2 1226 175 35 4 8579
2022-03-07 2.382 4 0 1256 161 18 5 8543
2022-03-08 0.733 4 2 1257 147 24 12 7326
2022-03-09 1.245 33 6 1297 113 20 10 5612

Exercise Log

date time type duration steps calories hr
2022-02-28 20:45:15 Swim 29.22 8 527 80
2022-03-01 02:35:04 Swim 12.41 15 767 123
2022-03-02 02:19:55 Swim 35.55 8 531 91
2022-03-03 00:20:32 Sport 15.21 334 669 99
2022-03-04 16:11:54 Walk 11.26 734 494 94
2022-03-05 03:21:53 Yoga 28.87 28 413 121
2022-03-06 22:04:07 Strength 13.84 321 554 112
2022-03-07 03:49:23 Workout 10.49 421 568 98
2022-03-08 10:26:17 Aerobic Workout 26.82 644 555 86
2022-03-09 19:34:00 Sport 29.74 148 608 115

Sleep and Intensity by Heart Rate

date sleep nonwear sedentary very.light light moderate vigorous near.maximal mvpa
2022-02-28 397 563 519 174 152 22 4 6 32
2022-03-01 401 620 507 138 137 22 11 5 38
2022-03-02 447 613 518 153 120 22 8 6 36
2022-03-03 391 648 457 181 120 16 15 3 34
2022-03-04 421 653 410 209 124 25 14 5 44
2022-03-05 359 485 588 201 128 23 10 5 38
2022-03-06 445 576 469 184 149 50 7 5 62
2022-03-07 407 571 541 152 140 26 8 2 36
2022-03-08 399 543 586 157 111 37 4 2 43
2022-03-09 372 557 491 205 131 36 14 6 56

Heart Rate Intraday Function

get_fitbit_heart_intraday(
  token.pathname,
  start.date = "2022-02-28",
  end.date = "2022-03-20",
  detail.level = "1min",
  age = 25,
  max_hr_formula = "220 - age",
  intensity = c(0, 0.2, 0.3, 0.4, 0.6, 0.9, 1),
  intensity_labels = c("sedentary", "very.light", "light", "moderate", "vigorous", "near.maximal"),
  heart_rate_method = 2,
  rest.hr = 60,
  do.parallel = TRUE,
  returnData = FALSE,
  returnRawData = FALSE,
  toSQL = TRUE,
  verbose = TRUE
)

Intraday Visualization of Heart Rate Data

Interday Visualization of Heart Rate Data

Example Report

Automating the Process

cmd <- cronR::cron_rscript("pathname/to/script.R", log_append = FALSE)
cronR::cron_add(
  command = cmd,
  frequency = '0 9 * * 1-5', # 9 am Monday-Friday
  id = 'iFitbitReport',
  user = "username",
  description = 'Extracts data from the Fitbit API and runs interactive reports',
  tags = c('fitbit', 'interactive', 'exercise', 'sleep'))

cronR::cron_ls("iFitbitReport") # List cron jobs
cronR::cron_rm('iFitbitReport') # Remove cron job

Conclusions

  • Many scientists view software as an important part of their research
  • Scientists developing software may benefit from open-source resources like GitHub to:
    • Enhances the learning process by allowing others to inspect, modify, and enhance their code
    • Increases reproducibility and drives innovation
  • Customized software for wearable devices can save time and increase collaboration in physical behavior research

References

1.
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; 2021. https://www.R-project.org/
2.
Montoye HJ, Washburn R, Servais S, Ertl A, Webster JG, Nagle FJ. Estimation of energy expenditure by a portable accelerometer. Medicine and science in sports and exercise. 1983;15(5):403-407.
3.
Migueles JH, Aadland E, Andersen LB, et al. GRANADA consensus on analytical approaches to assess associations with accelerometer-determined physical behaviours (physical activity, sedentary behaviour and sleep) in epidemiological studies. British journal of sports medicine. 2022;56(7):376-384.
4.
Neishabouri A, Nguyen J, Samuelsson J, et al. Quantification of acceleration as activity counts in ActiGraph wearable. Scientific Reports. 2022;12(1):11958.
5.
Choi L, Liu Z, Matthews CE, Buchowski MS. Validation of accelerometer wear and nonwear time classification algorithm. Medicine and science in sports and exercise. 2011;43(2):357.
6.
Montoye AH, Clevenger KA, Pfeiffer KA, et al. Development of cut-points for determining activity intensity from a wrist-worn ActiGraph accelerometer in free-living adults. Journal of Sports Sciences. 2020;38(22):2569-2578.
7.
Van den Boogaart KG, Tolosana-Delgado R. “Compositions”: A unified r package to analyze compositional data. Computers & Geosciences. 2008;34(4):320-338.
8.
Templ M, Hron K, Filzmoser P. robCompositions: An r-package for robust statistical analysis of compositional data. Compositional data analysis: Theory and applications. Published online 2011:341-355.